Jack Davis
2024-06-13
For theory, see “A Layered Grammar of Graphics” by Hadley Wickham; a preprint is available at http://vita.had.co.nz/papers/layered-grammar.pdf
(See also “Introducing the Grammar of Graphics Plotting Concept” by Science Craft, found at https://www.science-craft.com/2014/07/08/introducing-the-grammar-of-graphics-plotting-concept/ )
For the user: This grammar makes it easier to iteratively update a plot, changing a single feature at a time.
The grammar is also useful because it suggests the high-level aspects of a plot that can be changed, giving us a framework to think about graphics, and hopefully shortening the distance from mind to paper.
It also encourages the use of graphics customized to a particular problem rather than generic graphics. (Instead of being stuck with hist(), you can make a graph that has a geom_histogram layer and see what you want to add from there.)
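As a minimal sketch of this idea, the example below uses the built-in mtcars data (an assumption for illustration, not a dataset from these notes). The base hist() call gives one fixed plot, while the ggplot2 version starts from a single histogram layer and gains a rug of raw values with one more line.

```r
library(ggplot2)

# Base graphics: one call, one fixed kind of plot
hist(mtcars$mpg)

# ggplot2: start with a single histogram layer...
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# ...then change one thing at a time, here by adding a rug of the raw values
p_hist_rug <- p_hist + geom_rug()
p_hist_rug
```

The object names p_hist and p_hist_rug are hypothetical; the point is only that each step is a small addition to the previous object.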
For the developer, it makes it much easier to create new capabilities. You only need to add the one component that you need, and you can continue to use all the other existing components.
For example, you can add a new statistical transformation, and continue to use the existing scales and geoms. It is also useful for discovering new types of graphics, as the grammar defines the parameter space of statistical graphics.
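Layers are responsible for creating the objects that we perceive on the plot. A layer is composed of four parts: data and aesthetic mapping, a statistical transformation (stat), a geometric object (geom), and a position adjustment.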
Usually all the layers on a plot have something in common, typically that they are different views of the same data, for example, a scatterplot with an overlaid smoother.
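The scatterplot-with-smoother case can be sketched as follows, again assuming the built-in mtcars data. The helper describe_layer() is hypothetical; it only reads the geom, stat, and position stored on each layer object so the parts can be compared side by side.

```r
library(ggplot2)

# Two layers that are different views of the same data
p_layers <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Hypothetical helper: report the geom, stat and position of one layer.
# Data and aesthetic mapping are not listed because both layers inherit
# them from the ggplot() call above.
describe_layer <- function(l) {
  c(geom = class(l$geom)[1],
    stat = class(l$stat)[1],
    position = class(l$position)[1])
}

parts <- t(sapply(p_layers$layers, describe_layer))
parts
```

Both layers share the data and the x/y mapping; they differ only in their stat and geom.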
Some statistical transformations provided by ggplot2 are listed below. To learn more, try the help pages for stat_bin, stat_boxplot, stat_smooth, etc.
| Transformation | Description |
|---|---|
| bin | Divide continuous range into bins, and count number of points in each |
| boxplot | Compute statistics necessary for boxplot |
| contour | Calculate contour lines |
| density | Compute 1-D density estimate |
| identity | Identity transformation, f(x) = x |
| jitter | Jitter values by adding small random value |
| qq | Calculate values for quantile-quantile plot |
| quantile | Quantile regression |
| smooth | Smoothed conditional mean of y given x |
| summary | Aggregate values of y for given x |
| unique | Remove duplicated observations |
Every geom has a default statistic, and
every statistic a default geom.
For example, the bin statistic defaults to using the
bar geom to produce a histogram.
Overriding these defaults will still produce a valid plot, but it may violate graphical conventions.
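A small sketch of the default pairing and of overriding it, assuming the built-in mtcars data and an arbitrary choice of 10 bins:

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = mpg))

# Default pairing: the bin stat drawn with the bar geom is a histogram
p_default <- base_plot + stat_bin(bins = 10)        # geom defaults to "bar"
p_same    <- base_plot + geom_histogram(bins = 10)  # stat defaults to "bin"

# Overriding the geom still gives a valid plot: the same bin counts,
# but joined by a line instead of drawn as bars
p_override <- base_plot + stat_bin(bins = 10, geom = "line")
p_override
```

The first two objects describe the same histogram from opposite directions; the third keeps the statistic and swaps only the geom.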
Each geom can only display certain aesthetics.
A point geom has position, color, shape, and size aesthetics.
A bar geom has position, height, width, and fill color.
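One way to check which aesthetics a geom understands is to ask the geom object itself. This is a sketch that assumes the Geom objects exported by ggplot2 expose an aesthetics() method; the exact names returned can differ between ggplot2 versions.

```r
library(ggplot2)

# Aesthetics understood by the point geom and by the bar geom
point_aes <- GeomPoint$aesthetics()
bar_aes   <- GeomBar$aesthetics()

point_aes
bar_aes

# Aesthetics a point can display that a bar cannot
setdiff(point_aes, bar_aes)
```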
A scale controls the mapping from data to aesthetic attributes, and so we need one scale for each aesthetic property used in a layer.
Scales are common across layers to ensure a consistent mapping from data to aesthetics.
A scale is a function, and its inverse, along with a set of parameters. For example, the color gradient scale maps a segment of the real line to a path through a color space.
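The two ideas in that sentence can be sketched with the scales package, which ggplot2 builds on. The functions log10_trans(), rescale(), and seq_gradient_pal() come from scales; the data values and the two end colours are arbitrary assumptions.

```r
library(scales)

x <- c(1, 10, 100, 1000)

# A transformation object bundles a function with its inverse
log_scale <- log10_trans()
forward   <- log_scale$transform(x)       # data -> transformed space
back      <- log_scale$inverse(forward)   # transformed space -> data

# A colour gradient: squeeze the values onto [0, 1], then map that
# segment of the real line onto a path between two colours
to_unit  <- rescale(log10(x), to = c(0, 1))
gradient <- seq_gradient_pal(low = "grey80", high = "red")
gradient(to_unit)
```

The inverse is what lets axes and legends be labelled in the original data units.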
Consider this classic historical graph depicting Napoleon’s assault on, and retreat from, Russia. We can interpret it as a graphic with a few layers. (Let’s ignore the bottom part.)
In this graph, the thickness (size) of the line is the number of remaining troops. The (x,y) location of the center of the line is the longitude and latitude of the location of the troops. The colour of the line represents the direction of the troops (brown for advancing, black for retreating). That gives us four variables: troop count, longitude, latitude, and direction.
(Plotting data ‘troops’ from Wilkinson, L. (2005), The Grammar of Graphics (2nd ed.).)
We can make a ggplot with the troops dataset as its default data, and with longitude and latitude as the default x and y aesthetics, respectively.
Add a layer using the geom_path shorthand, where size is mapped to the number of surviving soldiers and color is mapped to the direction. group is also mapped to a group ID in the troops data. Why? During the campaign there were some side armies that joined into or split out of the main army. Without the group setting, the path made by geom_path would be one continuous line instead of several different line segments that branch off from each other.
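A minimal sketch of these two steps follows. The data frame troops_demo is a hypothetical stand-in: its values and column names (long, lat, survivors, direction, group) are assumptions, not the real troops data. The notes map size; ggplot2 3.4.0 and later name the thickness of a line linewidth, which is what this sketch uses.

```r
library(ggplot2)

# Hypothetical stand-in for the troops data (made-up values and column names)
troops_demo <- data.frame(
  long      = c(24.0, 28.0, 32.0, 37.6, 32.0, 28.0, 24.0, 28.0, 30.0),
  lat       = c(54.9, 54.5, 54.8, 55.8, 54.6, 54.3, 54.4, 54.5, 55.5),
  survivors = c(340000, 250000, 145000, 100000, 55000, 28000, 10000, 60000, 40000),
  direction = c("A", "A", "A", "A", "R", "R", "R", "A", "A"),
  group     = c(1, 1, 1, 1, 1, 1, 1, 2, 2)
)

# Default data and default x/y aesthetics
plot_base_demo <- ggplot(troops_demo, aes(x = long, y = lat))

# Path layer: thickness from survivors, colour from direction,
# and group so that the side army (group 2) is drawn as its own branch
plot_troops_demo <- plot_base_demo +
  geom_path(aes(linewidth = survivors, color = direction, group = group))
plot_troops_demo
```

Dropping group from the aes() call would join the last point of group 1 to the first point of group 2, which is the single continuous line described above.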
While the geom_path layer gives a set of line segments,
it still lacks context. We can add a text layer by adding onto the
original plot_troops, thereby inheriting settings of the
ggplot in plot_troops, like the dataset and
the aes().
Top: plot_troops, bottom: plot_both.
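The inheritance can be sketched as below. Both data frames are hypothetical stand-ins with made-up values and assumed column names. In this sketch the text layer supplies its own cities_demo data but still inherits the x and y mapping set in ggplot(). The line thickness is mapped through linewidth, the name used by ggplot2 3.4.0 and later.

```r
library(ggplot2)

# Hypothetical stand-ins for the troop and city data
troops_demo <- data.frame(
  long      = c(24.0, 30.0, 37.6, 30.0, 24.0),
  lat       = c(54.9, 54.6, 55.8, 54.3, 54.4),
  survivors = c(340000, 200000, 100000, 40000, 10000),
  direction = c("A", "A", "A", "R", "R"),
  group     = 1
)
cities_demo <- data.frame(
  long = c(24.0, 37.6),
  lat  = c(55.0, 55.8),
  city = c("Kowno", "Moscou")
)

plot_troops_demo <- ggplot(troops_demo, aes(x = long, y = lat)) +
  geom_path(aes(linewidth = survivors, color = direction, group = group))

# The text layer only states what is new (its data and the label);
# x = long and y = lat come from the original ggplot() call
plot_both_demo <- plot_troops_demo +
  geom_text(data = cities_demo, aes(label = city), size = 4)
plot_both_demo
```

Aesthetics set inside geom_path() belong to that layer only, so the text layer does not pick up the colour or thickness mapping.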
The default colours look like vintage children’s toothpaste. If we
add settings, the most recently added settings
(scale_color_manual) override the older ones.
plot_polished <- plot_both +
  scale_size(range = c(1, 10),                            # Make the line width diffs starker
             breaks = c(1, 2, 3) * 10^5,                  # Clean legend
             labels = scales::comma(c(1, 2, 3) * 10^5)) + # Clean legend
  scale_color_manual(values = c("grey50", "red")) +       # Goth toothpaste
  xlab(NULL) + ylab(NULL)                                 # No 'lat' and 'long' labels
plot_polished
Knowing what data goes with what visualization is a science. If you can’t find an example online of someone using similarly formatted data in a similar fashion, then have a second look and see if what you’re planning to do makes sense. (See Stat 442/842 - Data Visualization)
Some very general sources of ggplot2 examples and material are the R Graphics Cookbook (https://r-graphics.org/) and the R Graph Gallery (https://r-graph-gallery.com/).
Also, the Grammar of Tables has an introduction here https://gt.rstudio.com/ , for visualizations that are NOT just graphs.
Tanya Shapiro takes popular infographic images and recreates them with ggplot, which allows you to add your own data and twists.
https://github.com/tashapiro/tanya-data-viz/
See also the visualizations with tutorials at Andrew Weatherman’s page here: https://viz.aweatherman.com/viz/
Also included are gt (Grammar of Tables) visualizations.
See also the vizzes of Tony ElHabr. He has dozens available through his GitHub: https://github.com/tonyelhabr/sports_viz
https://github.com/tonyelhabr/sports_viz/blob/master/81-2023_mls_g_minus_xg/1-main.R
Like everything in ggplot2, we can draw the fields of various sports with geoms (here from the sportyR package).
geom_football("nfl")
geom_football("nfl", display_range = "red zone")
geom_baseball("mlb")
geom_baseball("mlb", display_range = "infield")
geom_soccer("fifa")
geom_soccer("fifa",
            pitch_updates = list(
              pitch_length = 100,
              pitch_width = 75))
geom_basketball("nba", display_range = "offense", rotation = 270)
geom_volleyball(league = "NCAA", rotation = 270, display_range = "offense")
# See https://sportyr.sportsdataverse.org/
# because ?geom_lacrosse gives bad league suggestions
geom_lacrosse(league = "NLL", field_units = "ft")
geom_tennis(league = "USTA", rotation = 270, display_range = "serving")
Tennis, without the beach.
(Tennis with the beach: http://blog.prospin.com.br/wp-content/uploads/2021/04/beach-tenis.jpg)
Here is the code for a similar function for plotting a hockey rink from scratch using really basic ggplot elements (lines and points). This comes from the OTTHAC tutorial for Big Data Cup 2021, found here: https://github.com/bigdatacup/Big-Data-Cup-2021/
Notice specifically all the geom_circle,
geom_point and geom_segment calls.
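Before the full function, here is a much smaller sketch of the same technique using only ggplot2. It draws one rounded corner, one straight line, and one dot. The corner uses the same sin/cos parametrisation that appears in the outline data frames of the rink function; the object names theta, corner, and mini_rink are hypothetical.

```r
library(ggplot2)

# One rounded corner: a quarter circle of radius 28 centred at (172, 28)
theta  <- seq(0, pi / 2, length.out = 20)
corner <- data.frame(
  x = 172 + 28 * sin(theta),
  y = 28 - 28 * cos(theta)
)

mini_rink <- ggplot() +
  # boards along the bottom edge, then up around the corner
  geom_path(data = rbind(data.frame(x = 100, y = 0), corner),
            aes(x = x, y = y), colour = "gray80") +
  # a straight line drawn as a single segment
  annotate("segment", x = 100, xend = 100, y = 0, yend = 42.5, colour = "gray50") +
  # a faceoff dot drawn as a single point
  annotate("point", x = 169, y = 20.5, colour = "gray50", size = 1) +
  coord_fixed() +
  theme_void()
mini_rink
```

The corner starts at (172, 0) and ends at (200, 28), so it joins the bottom boards to the end boards of a 200 by 85 surface. coord_fixed() keeps the arc circular on screen.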
# Create rink plot function
plot_rink = function(p_object){
  require(ggforce)
  require(cowplot)
  require(tidyverse)
  upper_outline = data.frame(
    x = c(
      115,
      172 + 28*sin(seq(0,pi/2,length=20)),
      172 + 28*sin(seq(pi/2,0,length=20)),
      115
    ),
    y = c(
      0,
      0 + 28 - 28*cos(seq(0,pi/2,length=20)),
      85 - 28 + 28*cos(seq(pi/2,0,length=20)),
      85
    )
  )
  lower_outline = data.frame(
    x = c(
      115,
      100-72 - 28*sin(seq(0,pi/2,length=20)),
      100-72 - 28*sin(seq(pi/2,0,length=20)),
      115
    ),
    y = c(
      0,
      0 + 28 - 28*cos(seq(0,pi/2,length=20)),
      85 - 28 + 28*cos(seq(pi/2,0,length=20)),
      85
    )
  )
  p = p_object +
    ## FACEOFF CIRCLES ##
    geom_circle(data = data.frame(x0 = 100, y0 = 42.5, r = 15), aes(x0 = x0, y0 = y0, r = r), lwd = 0.5, col = "gray50", inherit.aes = FALSE) +
    geom_circle(data = data.frame(x0 = 169, y0 = 20.5, r = 15), aes(x0 = x0, y0 = y0, r = r), lwd = 0.5, col = "gray50", inherit.aes = FALSE) +
    geom_circle(data = data.frame(x0 = 169, y0 = 64.5, r = 15), aes(x0 = x0, y0 = y0, r = r), lwd = 0.5, col = "gray50", inherit.aes = FALSE) +
    geom_circle(data = data.frame(x0 = 31, y0 = 64.5, r = 15), aes(x0 = x0, y0 = y0, r = r), lwd = 0.5, col = "gray50", inherit.aes = FALSE) +
    geom_circle(data = data.frame(x0 = 31, y0 = 20.5, r = 15), aes(x0 = x0, y0 = y0, r = r), lwd = 0.5, col = "gray50", inherit.aes = FALSE) +
    ## FACEOFF DOTS ##
    geom_point(inherit.aes = FALSE, aes(y = 42.5, x = 100), col = "gray50", size = 1) +
    geom_point(inherit.aes = FALSE, aes(y = 20.5, x = 169), col = "gray50", size = 1) +
    geom_point(inherit.aes = FALSE, aes(y = 64.5, x = 169), col = "gray50", size = 1) +
    geom_point(inherit.aes = FALSE, aes(y = 20.5, x = 120), col = "gray50", size = 1) +
    geom_point(inherit.aes = FALSE, aes(y = 64.5, x = 120), col = "gray50", size = 1) +
    geom_point(inherit.aes = FALSE, aes(y = 20.5, x = 31), col = "gray50", size = 1) +
    geom_point(inherit.aes = FALSE, aes(y = 64.5, x = 31), col = "gray50", size = 1) +
    geom_point(inherit.aes = FALSE, aes(y = 20.5, x = 80), col = "gray50", size = 1) +
    geom_point(inherit.aes = FALSE, aes(y = 64.5, x = 80), col = "gray50", size = 1) +
    ## BLUE AND RED LINES ##
    annotate("segment", col = "gray50", x = 75, xend = 75, y = 0, yend = 85, lwd = 0.5) +
    annotate("segment", col = "gray50", x = 100, xend = 100, y = 0, yend = 85, lwd = 0.5) +
    annotate("segment", col = "gray50", x = 125, xend = 125, y = 0, yend = 85, lwd = 0.5) +
    ## NET AND GOAL LINE ##
    geom_segment(col = "gray50", inherit.aes = FALSE, lwd = 0.5, aes(y = 79.25, x = 11, yend = 5.75, xend = 11)) +
    geom_segment(col = "indianred", inherit.aes = FALSE, lwd = 0.5, aes(y = 39.5, x = 7.5, yend = 45.5, xend = 7.5)) +
    geom_segment(col = "indianred", inherit.aes = FALSE, lwd = 0.5, aes(y = 39.5, x = 7.5, yend = 39.5, xend = 11)) +
    geom_segment(col = "indianred", inherit.aes = FALSE, lwd = 0.5, aes(y = 45.5, x = 7.5, yend = 45.5, xend = 11)) +
    geom_segment(col = "gray50", inherit.aes = FALSE, lwd = 0.5, aes(y = 5.75, x = 189, yend = 79.25, xend = 189)) +
    geom_segment(col = "indianred", inherit.aes = FALSE, lwd = 0.5, aes(y = 39.5, x = 192.5, yend = 45.5, xend = 192.5)) +
    geom_segment(col = "indianred", inherit.aes = FALSE, lwd = 0.5, aes(y = 39.5, x = 192.5, yend = 39.5, xend = 189)) +
    geom_segment(col = "indianred", inherit.aes = FALSE, lwd = 0.5, aes(y = 45.5, x = 192.5, yend = 45.5, xend = 189)) +
    ## OUTLINE ##
    geom_path(data = upper_outline, aes(x = x, y = y), colour = "gray80", inherit.aes = FALSE, lwd = 0.5) +
    geom_path(data = lower_outline, aes(x = x, y = y), colour = "gray80", inherit.aes = FALSE, lwd = 0.5) +
    ## ADDITIONAL SPECS ##
    scale_x_continuous(expand = c(0, 0), limits = c(0,200)) + scale_y_continuous(expand = c(0,0), limits = c(0,85)) +
    coord_fixed() +
    theme_void()
  return(p)
}
Taken from: https://sportyr.sportsdataverse.org/articles/plotting-tracking-data.html
Getting the data
# Change names of X Coordinate and Y Coordinate to x and y respectively
names(bdc_data)[13:14] <- c("x", "y")
names(bdc_data)[20:21] <- c("x2", "y2")
# Preview what the data looks like
kable(head(bdc_data))

| game_date | Home Team | Away Team | Period | Clock | Home Team Skaters | Away Team Skaters | Home Team Goals | Away Team Goals | Team | Player | Event | x | y | Detail 1 | Detail 2 | Detail 3 | Detail 4 | Player 2 | x2 | y2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-01-23 | Minnesota Whitecaps | Boston Pride | 1 | 20:00 | 5 | 5 | 0 | 0 | Boston Pride | Jillian Dempsey | Faceoff Win | 100 | 43 | Backhand | | | | Stephanie Anderson | NA | NA |
| 2021-01-23 | Minnesota Whitecaps | Boston Pride | 1 | 19:58 | 5 | 5 | 0 | 0 | Boston Pride | McKenna Brand | Puck Recovery | 107 | 40 | | | | | | NA | NA |
| 2021-01-23 | Minnesota Whitecaps | Boston Pride | 1 | 19:57 | 5 | 5 | 0 | 0 | Boston Pride | McKenna Brand | Zone Entry | 125 | 28 | Carried | | | | Maddie Rowe | NA | NA |
| 2021-01-23 | Minnesota Whitecaps | Boston Pride | 1 | 19:55 | 5 | 5 | 0 | 0 | Boston Pride | McKenna Brand | Shot | 131 | 28 | Snapshot | On Net | t | f | | NA | NA |
| 2021-01-23 | Minnesota Whitecaps | Boston Pride | 1 | 19:53 | 5 | 5 | 0 | 0 | Boston Pride | Tereza Vanisova | Faceoff Win | 169 | 21 | Backhand | | | | Stephanie Anderson | NA | NA |
| 2021-01-23 | Minnesota Whitecaps | Boston Pride | 1 | 19:52 | 5 | 5 | 0 | 0 | Boston Pride | Samantha Davis | Puck Recovery | 159 | 26 | | | | | | NA | NA |
# Subset to only be shots from the game on 2021-01-23 between the Minnesota
# Whitecaps and Boston Pride
bdc_shots <- bdc_data[(bdc_data$Event == "Shot") &
                        (bdc_data$`Home Team` == "Minnesota Whitecaps") &
                        (bdc_data$game_date == "2021-01-23"), ]
# Separate shots by team
whitecaps_shots <- bdc_shots[bdc_shots$Team == "Minnesota Whitecaps", ]
pride_shots <- bdc_shots[bdc_shots$Team == "Boston Pride", ]
# Correct the shot location
whitecaps_shots["x"] <- 200 - whitecaps_shots["x"]
whitecaps_shots["y"] <- 85 - whitecaps_shots["y"]
## game_date Home Team Away Team Period Clock Home Team Skaters
## 21 2021-01-23 Minnesota Whitecaps Boston Pride 1 19:21 5
## 30 2021-01-23 Minnesota Whitecaps Boston Pride 1 19:07 5
## 94 2021-01-23 Minnesota Whitecaps Boston Pride 1 16:50 5
## 125 2021-01-23 Minnesota Whitecaps Boston Pride 1 15:49 5
## 133 2021-01-23 Minnesota Whitecaps Boston Pride 1 15:39 5
## 148 2021-01-23 Minnesota Whitecaps Boston Pride 1 15:04 5
## Away Team Skaters Home Team Goals Away Team Goals Team
## 21 5 0 0 Minnesota Whitecaps
## 30 5 0 0 Minnesota Whitecaps
## 94 5 0 0 Minnesota Whitecaps
## 125 5 0 0 Minnesota Whitecaps
## 133 5 0 0 Minnesota Whitecaps
## 148 5 0 0 Minnesota Whitecaps
## Player Event x y Detail 1 Detail 2 Detail 3 Detail 4 Player 2 x2
## 21 Allie Thunstrom Shot 38 83 Snapshot On Net f f NA
## 30 Nina Rodgers Shot 33 67 Snapshot Missed f f NA
## 94 Haylea Schmid Shot 19 42 Snapshot On Net f f NA
## 125 Jonna Curtis Shot 33 18 Snapshot On Net f f NA
## 133 Meaghan Pezon Shot 47 23 Snapshot Missed t f NA
## 148 Lynn Astrup Shot 36 62 Snapshot Missed t f NA
## y2
## 21 NA
## 30 NA
## 94 NA
## 125 NA
## 133 NA
## 148 NA
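The shot-location correction is a reflection through the centre of the 200 by 85 rink, so that the two teams are drawn attacking opposite ends. A small sketch with a hypothetical shots_demo data frame; its values are chosen so that the flipped coordinates equal the first three x and y pairs in the output above.

```r
# Hypothetical uncorrected shot locations on a 200 by 85 rink
shots_demo <- data.frame(x = c(162, 167, 181), y = c(2, 18, 43))

# Reflect through the centre of the rink: x -> 200 - x, y -> 85 - y
flipped <- data.frame(x = 200 - shots_demo$x, y = 85 - shots_demo$y)
flipped

# The midpoint of every original/flipped pair is centre ice, (100, 42.5)
mid_x <- (shots_demo$x + flipped$x) / 2
mid_y <- (shots_demo$y + flipped$y) / 2
```

Applying the same two subtractions a second time returns the original coordinates, so the correction is its own inverse.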
# Add the shots to the plot
phf_rink +
  geom_point(data = whitecaps_shots, aes(x, y), color = "#2251b8") +
  geom_point(data = pride_shots, aes(x, y), color = "#fec52e")
# Subset the data to be Boston's passes
boston_passes <- bdc_data[(bdc_data$Event == "Play") &
                            (bdc_data$Team == "Boston Pride") &
                            (bdc_data$game_date == "2021-01-23"), ]
head(boston_passes)
## game_date Home Team Away Team Period Clock Home Team Skaters
## 12 2021-01-23 Minnesota Whitecaps Boston Pride 1 19:40 5
## 18 2021-01-23 Minnesota Whitecaps Boston Pride 1 19:28 5
## 23 2021-01-23 Minnesota Whitecaps Boston Pride 1 19:17 5
## 41 2021-01-23 Minnesota Whitecaps Boston Pride 1 18:39 5
## 43 2021-01-23 Minnesota Whitecaps Boston Pride 1 18:36 5
## 52 2021-01-23 Minnesota Whitecaps Boston Pride 1 18:20 5
## Away Team Skaters Home Team Goals Away Team Goals Team
## 12 5 0 0 Boston Pride
## 18 5 0 0 Boston Pride
## 23 5 0 0 Boston Pride
## 41 5 0 0 Boston Pride
## 43 5 0 0 Boston Pride
## 52 5 0 0 Boston Pride
## Player Event x y Detail 1 Detail 2 Detail 3 Detail 4
## 12 Mallory Souliotis Play 3 49 Direct
## 18 Mallory Souliotis Play 12 33 Direct
## 23 Mallory Souliotis Play 25 3 Indirect
## 41 Taylor Turnquist Play 73 81 Indirect
## 43 Jillian Dempsey Play 135 52 Direct
## 52 Mary Parker Play 168 3 Indirect
## Player 2 x2 y2
## 12 Taylor Wenczkowski 37 70
## 18 Taylor Wenczkowski 61 85
## 23 Samantha Davis 57 3
## 41 Jillian Dempsey 117 75
## 43 McKenna Brand 168 74
## 52 Lexie Laing 199 35
# Plot passes with geom_segment()
phf_rink +
  geom_segment(
    data = boston_passes,
    aes(
      x = x,
      y = y,
      xend = x2,
      yend = y2
    ),
    lineend = "round",
    linejoin = "round",
    color = "#ffcb05"
  )
From https://sportyr.sportsdataverse.org/articles/animating-tracking-data.html
# Load the play data
example_nfl_play <- data.table::fread(
  glue::glue(
    "https://raw.githubusercontent.com/sportsdataverse/sportyR/",
    "main/data-raw/example-pbp-data.csv"
  )
)
# Convert to data frame
example_nfl_play <- as.data.frame(example_nfl_play)
## time x y s a dis o dir event nflId
## 1 2018-12-16 18:04:41 33.52 31.02 0.05 0.01 0.04 234.40 238.47 None 80431
## 2 2018-12-16 18:04:41 46.79 29.45 0.55 1.00 0.05 278.70 78.66 None 2506789
## 3 2018-12-16 18:04:41 36.56 27.79 0.05 0.02 0.01 271.21 265.99 None 2532928
## 4 2018-12-16 18:04:41 30.59 14.43 0.00 0.00 0.00 72.27 173.40 None 2543509
## 5 2018-12-16 18:04:41 34.31 35.42 0.14 0.40 0.01 279.12 216.91 None 2543571
## 6 2018-12-16 18:04:41 31.56 18.53 0.00 0.00 0.00 88.63 75.94 None 2550284
## displayName jerseyNumber position frameId team gameId playId
## 1 Clay Matthews 52 OLB 1 away 2018121603 105
## 2 Tramon Williams 38 CB 1 away 2018121603 105
## 3 Eddie Pleasant 35 SS 1 away 2018121603 105
## 4 Allen Robinson 12 WR 1 home 2018121603 105
## 5 Bashaud Breeland 26 CB 1 away 2018121603 105
## 6 Trey Burton 80 TE 1 home 2018121603 105
## playDirection route
## 1 right
## 2 right
## 3 right
## 4 right HITCH
## 5 right
## 6 right IN
## time x y s a dis o dir event nflId
## 1371 2018-12-16 18:04:49 35.47 25.82 4.05 0.68 0.41 91.73 81.84 None 2558008
## 1372 2018-12-16 18:04:49 61.66 1.69 8.46 2.16 0.85 126.74 117.82 None 2558119
## 1373 2018-12-16 18:04:49 54.45 16.98 6.58 1.24 0.67 104.53 118.23 None 2558250
## 1374 2018-12-16 18:04:49 60.30 38.08 5.23 1.61 0.53 141.67 133.82 None 2560755
## 1375 2018-12-16 18:04:49 54.07 5.24 6.16 1.91 0.63 78.14 102.57 None 2560952
## 1376 2018-12-16 18:04:49 63.44 0.50 9.43 1.50 0.95 NA NA None NA
## displayName jerseyNumber position frameId team gameId playId
## 1371 Mitchell Trubisky 10 QB 86 home 2018121603 105
## 1372 Josh Jones 27 SS 86 away 2018121603 105
## 1373 Tarik Cohen 29 RB 86 home 2018121603 105
## 1374 Josh Jackson 37 CB 86 away 2018121603 105
## 1375 Jaire Alexander 23 CB 86 away 2018121603 105
## 1376 Football NA 86 football 2018121603 105
## playDirection route
## 1371 right
## 1372 right
## 1373 right GO
## 1374 right
## 1375 right
## 1376 right
# Prep data for plotting
example_nfl_play[example_nfl_play["team"] == "home", "color"] <- "#c83803"
example_nfl_play[example_nfl_play["team"] == "away", "color"] <- "#ffb612"
example_nfl_play[example_nfl_play["team"] == "football", "color"] <- "#624a2e"
# Create the field
nfl_field <- geom_football("nfl", x_trans = 60, y_trans = 26.6667)
# Display the field
nfl_field
# Add the points on the field
play_anim <- nfl_field +
  geom_point(
    data = example_nfl_play,
    aes(x, y),
    color = example_nfl_play$color
  ) +
  transition_time(example_nfl_play$frameId)
# Show the animation
play_anim
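To see what animating by frameId amounts to, here is a sketch of the same idea done by hand with ggplot2 only: split the tracking data by frame and draw one static plot per frame. The tracking_demo data frame and its values are hypothetical. gganimate additionally interpolates between frames and renders them to a gif or video, which this sketch does not attempt.

```r
library(ggplot2)

# Hypothetical tracking data: two players observed over three frames
tracking_demo <- data.frame(
  frameId = rep(1:3, each = 2),
  player  = rep(c("p1", "p2"), times = 3),
  x = c(10, 20, 12, 19, 14, 18),
  y = c(5, 5, 6, 4, 7, 3)
)

# One static plot per frame; fixed limits keep the view still between frames
frames <- lapply(split(tracking_demo, tracking_demo$frameId), function(d) {
  ggplot(d, aes(x, y, colour = player)) +
    geom_point() +
    coord_cartesian(xlim = c(0, 30), ylim = c(0, 10)) +
    ggtitle(paste("frameId:", d$frameId[1]))
})

# Show the second frame
frames[[2]]
```

Flipping through the elements of frames in order gives a crude version of the animation; transition_time() automates this and smooths the movement between consecutive frames.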